Flow Matching For Generative Modeling
This is a learning note from this series of videos.
Link to the paper: https://arxiv.org/abs/2210.02747
Supplementary papers:
Step-by-Step Diffusion: An Elementary Tutorial
Glow: Generative Flow with Invertible 1x1 Convolutions
1. What is Flow? Understand vector field and ODE.
Generative Modeling: Given samples drawn from an unknown data distribution, we want to estimate that distribution.
Method: learn a mapping that transports an easy-to-sample base distribution (such as a Gaussian) to the data distribution.
How to find this mapping: 1. Normalizing Flow; 2. Flow Matching (ODE)
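Key points of Flow Matching:
- Let a sample from the base distribution and a sample from the data distribution be the initial and final points of the ODE.
- Use a neural network to fit the velocity term (the right-hand side) of the ODE.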
- Solve the ODE.
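The three steps above can be sketched in a toy 1-D setting. This is only an illustration under several assumptions that are not in the notes: the target is a hypothetical Gaussian N(M, S^2), points are joined by a straight path x_t = (1 - t) x_0 + t x_1 with time running from noise (t = 0) to data (t = 1), and a per-time-step linear least-squares fit stands in for the neural network (it is sufficient here only because both distributions are Gaussian).

```python
import numpy as np

rng = np.random.default_rng(0)
M, S = 2.0, 0.5                      # hypothetical 1-D target distribution N(M, S^2)
N_STEPS, N_TRAIN = 100, 20_000
ts = np.linspace(0.0, 1.0, N_STEPS, endpoint=False)   # left endpoints of the Euler steps
dt = 1.0 / N_STEPS

def fit_velocity(t):
    """Least-squares fit of v_t(x) = a * x + b to the regression targets x1 - x0."""
    x0 = rng.standard_normal(N_TRAIN)           # initial points: base (noise) samples
    x1 = M + S * rng.standard_normal(N_TRAIN)   # final points: data samples
    xt = (1.0 - t) * x0 + t * x1                # point on the straight path at time t
    a, b = np.polyfit(xt, x1 - x0, deg=1)       # regress the path velocity on xt
    return a, b

# Step 2: fit the right-hand side of the ODE (linear model instead of a neural network).
coeffs = [fit_velocity(t) for t in ts]

def sample(n):
    x = rng.standard_normal(n)                  # Step 1: start from base samples
    for a, b in coeffs:                         # Step 3: solve the ODE with Euler steps
        x = x + dt * (a * x + b)
    return x

samples = sample(20_000)
print(samples.mean(), samples.std())            # should land close to M and S
```

A real implementation would replace `fit_velocity` with gradient-based training of a network that takes both x and t as input; the sampling loop stays the same.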
Preliminaries
ODE → Flow → Normalizing Flow → Continuous Normalizing Flow
Flow
Definition 1. (Flow): A flow is a collection of time-indexed vector fields $v = \{v_t\}_{t \in [0,1]}$.
Any flow defines a trajectory taking initial points $x_1$ to final points $x_0$, by transporting the initial point along the velocity fields $\{v_t\}$. This is equivalent to a transport between two distributions.
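Formally, for velocity field $v$ and initial point $x_1$, consider the ODE
$$ \frac{dx_t}{dt} = -v_t(x_t) $$
with initial condition $x_1$ at time $t = 1$. We write
$$ x_t := \mathrm{RunFlow}(v, x_1, t) $$
to denote the solution to the flow ODE at time $t$, terminating at final point $x_0$. That is, RunFlow is the result of transporting point $x_1$ along the flow $v$ up to time $t$. The minus sign appears because time runs backward in this convention, from $t = 1$ (initial point) to $t = 0$ (final point).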
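As a rough illustration of RunFlow, the sketch below integrates the flow ODE with plain Euler steps. It assumes the convention dx_t/dt = -v_t(x_t) with the initial condition given at t = 1; the function name `run_flow`, the step count, and the test field v_t(x) = x are illustrative choices, not part of the notes.

```python
import numpy as np

def run_flow(v, x1, t, n_steps=1000):
    """Euler approximation of RunFlow(v, x1, t).

    Integrates dx/dt = -v(x, t) from time 1 down to time t,
    starting from the initial point x1.
    """
    x = np.asarray(x1, dtype=float)
    dt = (1.0 - t) / n_steps      # positive step size; time decreases by dt per step
    s = 1.0                       # current time
    for _ in range(n_steps):
        # x(s - dt) ~ x(s) - dt * dx/ds = x(s) + dt * v(x, s)
        x = x + dt * v(x, s)
        s -= dt
    return x

# Example field v_t(x) = x: the ODE is dx/dt = -x, whose exact solution
# started from x1 at t = 1 is x_t = x1 * exp(1 - t).
x0 = run_flow(lambda x, t: x, x1=1.0, t=0.0)
print(float(x0), np.e)            # the two values should nearly agree
```

Euler is used only for brevity; any ODE solver could be substituted.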
Flows also define transports between entire distributions by pushing forward points from the source distribution along their trajectories. If $p_1$ is a distribution on initial points, then applying the flow $v$ yields the distribution on final points:
$$ p_0 = \{\mathrm{RunFlow}(v, x_1, t = 0)\}_{x_1 \sim p_1} $$
This process is denoted as $p_1 \overset{v}{\hookrightarrow} p_0$, meaning the flow $v$ transports initial distribution $p_1$ to final distribution $p_0$.
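To make the distribution-level picture concrete, here is a small sketch that pushes samples from a source distribution through a flow. It assumes the ODE dx/dt = -v_t(x) integrated from t = 1 down to t = 0, a standard Gaussian source, and a hypothetical constant field v_t(x) = A, which simply shifts every point by A.

```python
import numpy as np

def run_flow(v, x1, t=0.0, n_steps=100):
    """Euler integration of dx/dt = -v(x, t) from time 1 down to time t."""
    x = np.asarray(x1, dtype=float)
    dt = (1.0 - t) / n_steps
    s = 1.0
    for _ in range(n_steps):
        x = x + dt * v(x, s)      # stepping backward in time flips the sign of -v
        s -= dt
    return x

rng = np.random.default_rng(0)
A = 3.0                                   # hypothetical constant velocity

def v(x, t):
    return np.full_like(x, A)             # same velocity at every point and time

x1 = rng.standard_normal(50_000)          # samples from the source distribution N(0, 1)
x0 = run_flow(v, x1)                      # samples from the transported distribution

# The flow shifts every sample by A, so N(0, 1) is transported to N(A, 1).
print(x1.mean(), x0.mean(), x0.std())
```

Transporting a distribution never requires its density: it is enough to move samples point by point.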
THE ULTIMATE GOAL OF FLOW MATCHING is to learn a velocity field $v^*$ which transports $q \overset{v^*}{\hookrightarrow} p$, where $p$ is the target distribution and $q$ is some easy-to-sample base distribution (such as a Gaussian).
Continuous Normalizing Flows
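Let $\mathbb{R}^d$ denote the data space with data points $x = (x^1, \dots, x^d) \in \mathbb{R}^d$.
The probability density path $p : [0,1] \times \mathbb{R}^d \to \mathbb{R}_{>0}$ is a time-dependent probability density function, i.e., $\int p_t(x)\,dx = 1$.
$v : [0,1] \times \mathbb{R}^d \to \mathbb{R}^d$ is a time-dependent vector field.
A vector field $v_t$ can be used to construct a time-dependent diffeomorphic map, called a flow, $\phi : [0,1] \times \mathbb{R}^d \to \mathbb{R}^d$, defined via the ordinary differential equation (ODE):
$$ \frac{d}{dt}\phi_t(x) = v_t(\phi_t(x)), \qquad \phi_0(x) = x $$
Here $\phi_t(x)$ is a solution to the ODE, and we call it a flow. We can model the vector field $v_t$ with a neural network $v_t(x; \theta)$, where $\theta \in \mathbb{R}^p$ are its learnable parameters.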
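The relation between a vector field and its flow can be checked numerically. The sketch below assumes the ODE d/dt phi_t(x) = v_t(phi_t(x)) with phi_0(x) = x, and replaces the neural network with a hand-written linear field v_t(x) = A x (a rotation generator), for which the exact flow is a rotation by angle t. The helper name `flow` and the RK4 solver are illustrative choices.

```python
import numpy as np

def flow(v, x, t, n_steps=200):
    """RK4 approximation of phi_t(x), where d/dt phi = v(phi, t) and phi_0(x) = x."""
    x = np.asarray(x, dtype=float)
    h = t / n_steps
    s = 0.0
    for _ in range(n_steps):
        k1 = v(x, s)
        k2 = v(x + 0.5 * h * k1, s + 0.5 * h)
        k3 = v(x + 0.5 * h * k2, s + 0.5 * h)
        k4 = v(x + h * k3, s + h)
        x = x + (h / 6.0) * (k1 + 2.0 * k2 + 2.0 * k3 + k4)
        s += h
    return x

# Hypothetical time-independent field v_t(x) = A x generating counterclockwise rotation.
A = np.array([[0.0, -1.0],
              [1.0,  0.0]])

def v(x, t):
    return x @ A.T                # equals A @ x for a single 2-D point

x = np.array([1.0, 0.0])
print(flow(v, x, np.pi / 2))      # approximately [0, 1]: a quarter turn
```

With a learned field, `v` would be a network evaluated at (x, t) and the same integration loop would produce the flow.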
2. The Continuity Equation and Fokker-Planck Equation
Continuity Equation
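How to test if a vector field $v_t$ generates a probability path $p_t$? Check the continuity equation:
$$ \frac{d}{dt} p_t(x) + \operatorname{div}\big(p_t(x)\, v_t(x)\big) = 0 $$
where the divergence is taken with respect to $x$. The vector field $v_t$ generates $p_t$ if and only if this equation holds.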
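A quick numerical check of the continuity equation, d/dt p_t(x) + div(p_t(x) v_t(x)) = 0, is sketched below in 1-D with central finite differences. The pair used here is an assumed example, not taken from the notes: p_t = N(0, (1 + t)^2) together with v_t(x) = x / (1 + t), which is the velocity of the scaling flow phi_t(x) = (1 + t) x.

```python
import numpy as np

def p(x, t):
    """Density of N(0, (1 + t)^2) evaluated at x."""
    sigma = 1.0 + t
    return np.exp(-0.5 * (x / sigma) ** 2) / (sigma * np.sqrt(2.0 * np.pi))

def v(x, t):
    """Velocity field of the scaling flow phi_t(x) = (1 + t) * x."""
    return x / (1.0 + t)

def continuity_residual(p, v, x, t, eps=1e-4):
    """Finite-difference estimate of d/dt p_t(x) + d/dx (p_t(x) v_t(x))."""
    dp_dt = (p(x, t + eps) - p(x, t - eps)) / (2.0 * eps)
    flux = lambda z: p(z, t) * v(z, t)
    div_flux = (flux(x + eps) - flux(x - eps)) / (2.0 * eps)
    return dp_dt + div_flux

x = np.linspace(-4.0, 4.0, 81)
res = continuity_residual(p, v, x, t=0.5)
# Control: the reversed field -v does not generate this path, so its residual is large.
wrong = continuity_residual(p, lambda z, t: -v(z, t), x, t=0.5)
print(np.abs(res).max(), np.abs(wrong).max())
```

The first number should be near zero up to discretization error, while the second should be clearly nonzero.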
Conditional VFs for Fokker-Planck probability paths
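Consider a Stochastic Differential Equation (SDE) of the standard form:
$$ dy = f_t \, dt + g_t \, dw $$
with time parameter $t$, drift $f_t$, diffusion coefficient $g_t$, and Wiener process $w$.
The solution to the SDE is $y_t$, which is a stochastic process (a continuous time-dependent random variable). Its probability density $p_t(y_t)$ is characterized by the Fokker-Planck equation:
$$ \frac{dp_t}{dt} = -\operatorname{div}(f_t p_t) + \frac{g_t^2}{2} \Delta p_t $$
where $\Delta$ represents the Laplace operator (in $y$), namely $\operatorname{div} \nabla$, where $\nabla$ is the gradient operator. We can rewrite the above equation in the form of the continuity equation:
$$ \frac{dp_t}{dt} = -\operatorname{div}\Big(f_t p_t - \frac{g_t^2}{2} \frac{\nabla p_t}{p_t} p_t\Big) = -\operatorname{div}\Big(\Big(f_t - \frac{g_t^2}{2} \nabla \log p_t\Big) p_t\Big) = -\operatorname{div}(w_t p_t) $$
where the vector field
$$ w_t = f_t - \frac{g_t^2}{2} \nabla \log p_t $$
satisfies the continuity equation with the probability path $p_t$, and therefore generates $p_t$.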
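The claim that the deterministic field w_t = f_t - (g_t^2 / 2) grad log p_t generates the same marginals as the SDE can be illustrated on a case where the score is known in closed form. The sketch assumes zero drift, a constant diffusion coefficient G, and y_0 ~ N(0, 1), so that p_t = N(0, 1 + G^2 t); these choices and the names `score` and `w` are illustrative, not from the notes.

```python
import numpy as np

rng = np.random.default_rng(0)
N, STEPS, T = 20_000, 200, 1.0
dt = T / STEPS
G = 1.0                               # assumed constant diffusion coefficient; drift f_t = 0

def score(y, t):
    """grad log p_t(y) for p_t = N(0, 1 + G^2 t)."""
    return -y / (1.0 + G**2 * t)

def w(y, t):
    """Vector field w_t = f_t - (g_t^2 / 2) * grad log p_t, with f_t = 0."""
    return 0.0 - 0.5 * G**2 * score(y, t)

y0 = rng.standard_normal(N)
y_sde = y0.copy()
y_ode = y0.copy()
for k in range(STEPS):
    t = k * dt
    y_sde = y_sde + G * np.sqrt(dt) * rng.standard_normal(N)   # Euler-Maruyama step of the SDE
    y_ode = y_ode + dt * w(y_ode, t)                           # Euler step of dy/dt = w_t(y)

# Both variances should be close to 1 + G^2 * T even though the ODE has no noise.
print(y_sde.var(), y_ode.var())
```

Individual trajectories differ (the SDE paths are random, the ODE paths are smooth), but the marginal distributions at each time agree.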
3. Continuous Normalizing Flow (CNF)
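A CNF is used to reshape a simple prior density $p_0$ (e.g., pure noise) to a more complicated one, $p_1$, via the push-forward equation:
$$ p_t = [\phi_t]_* p_0 $$
The push-forward (or change of variable) operator $*$ is defined by:
$$ [\phi_t]_* p_0(x) = p_0\big(\phi_t^{-1}(x)\big) \det\Big[\frac{\partial \phi_t^{-1}}{\partial x}(x)\Big] $$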
Change of variables in the probability density function
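Suppose $x$ is an n-dimensional random variable with joint density $f$. If $y = G(x)$, where $G$ is a bijective, differentiable function, then $y$ has density $p_Y$:
$$ p_Y(y) = f\big(G^{-1}(y)\big) \left| \det\left[ \frac{dG^{-1}(z)}{dz} \Big|_{z = y} \right] \right| $$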
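A one-dimensional sanity check of the change-of-variables formula, p_Y(y) = f(G^{-1}(y)) |dG^{-1}/dy|, is sketched below. The choices x ~ N(0, 1) and G(x) = exp(x) are assumptions made for illustration; since Y <= 1 exactly when X <= 0, the density obtained from the formula should integrate to 0.5 over (0, 1].

```python
import numpy as np

def f(x):
    """Density of the standard normal variable x."""
    return np.exp(-0.5 * x**2) / np.sqrt(2.0 * np.pi)

def G_inv(y):
    """Inverse of the assumed bijection G(x) = exp(x)."""
    return np.log(y)

def dG_inv(y):
    """Derivative of G^{-1}; in 1-D the Jacobian determinant is just this scalar."""
    return 1.0 / y

def p_Y(y):
    """Density of y = G(x) given by the change-of-variables formula."""
    return f(G_inv(y)) * np.abs(dG_inv(y))

# Trapezoidal integration of p_Y over (0, 1]; the lower limit avoids log(0).
y = np.linspace(1e-6, 1.0, 200_001)
vals = p_Y(y)
prob = np.sum(0.5 * (vals[1:] + vals[:-1]) * np.diff(y))
print(prob)                           # close to P(X <= 0) = 0.5
```

In higher dimensions the scalar derivative is replaced by the determinant of the Jacobian matrix of the inverse map, exactly as in the formula above.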
Relationships between Flow, Vector Field, and Probability Density
- Equations (2) and (3) (the flow ODE) describe the relationship between the flow $\phi_t$ and the vector field $v_t$.
- Equations (8) and (9) (the push-forward equations) describe how the flow changes the density from $p_0$ to $p_t$.
- Equation (4) (the continuity equation) gives a necessary and sufficient condition to test whether a vector field $v_t$ generates a probability path $p_t$.